Reliable Communication Infrastructure for Adaptive Data Replication
Authors
Abstract
In this paper, we propose a data replication algorithm that adapts to unreliable environments. The underlying algorithm, named Adaptive Data Replication (ADR), already encapsulates an adaptiveness mechanism in its dynamic replica placement strategy. Our extension of ADR to unreliable environments provides a data replication solution that is adaptive both in terms of replica placement and in terms of request routing. At the routing level, this solution takes the unreliability of the environment into account in order to maximize reliable delivery of requests. At the replica placement level, the dynamically changing origin and frequency of read/write requests are analyzed in order to define a set of replicas that minimizes communication cost. Performance evaluation shows that this original combination of two adaptive strategies makes it possible to ensure high request delivery while minimizing communication overhead in the system.

1 Adaptive Data Replication

Data replication is a well-known technique to increase data availability and load balancing. A data replication system can be characterized by two key policies: a replica placement policy, which determines how many replicas the scheme creates and where it places them, and a replica consistency policy, which determines the level of consistency the scheme ensures among replicas, e.g., eager consistency or lazy consistency. These policies are typically implemented on top of a communication substrate that ensures a set of properties necessary for the correctness of the data replication system. An example of such a substrate is the group communication abstraction [1, 5, 8]. In this case, group communication offers a set of guarantees including adaptiveness to membership changes, message ordering, and multicast reliability. In this paper, we define as our communication substrate a routing mechanism that adapts to unreliable environments, and we use it as the basis for a replica placement solution.
We are primarily concerned with replica placement; replica consistency is outside the scope of this paper. Regarding the replica placement policy, various replica management schemes have been proposed, based on a fixed number of replicas placed in fixed locations [2, 15, 13]. This approach works well when the source and the frequency of read and write requests are known in advance and remain static during the execution, which in turn implies that the clients accessing the replicas are themselves static and generate a steady stream of requests. When the frequency and the source of requests vary, however, the ability to dynamically create, move, and delete replicas is essential for devising efficient replication schemes.

In a dynamic distributed environment, replica placement significantly affects the overall performance of the replication scheme. For example, since reading a replica locally is faster and less costly than reading it remotely, a widely distributed replication scheme is particularly well suited to read-intensive environments. On the other hand, writing to a large number of replicas may be slow and increases communication costs. For this reason, a narrowly distributed replication scheme is better suited to write-intensive environments. In addition, the occurrence of node and link failures further challenges the effectiveness and performance of the replication scheme, as it can radically compromise replica placement decisions made before the failures occurred.

The problem of placing replicas in dynamic and unreliable distributed environments calls for integrating adaptiveness into the replication scheme. To adapt to the dynamic behavior of the environment and to the application access patterns, various solutions have been proposed in the literature [16, 11, 12, 19].
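The read/write trade-off described above can be illustrated with a small sketch. The cost model here is an assumption made for illustration only, not the one used in the paper: every hop costs one unit, a read travels to the nearest replica, and a write travels to the nearest replica and is then propagated over the links of a connected replication scheme. The names `hops_to_scheme` and `communication_cost`, the five-node path network, and the request counts are all hypothetical.

```python
from collections import deque

def hops_to_scheme(adj, scheme, source):
    """Breadth-first search: number of hops from source to the nearest replica."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in scheme:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("no replica reachable from source")

def communication_cost(adj, scheme, reads, writes):
    """Total hops under the assumed unit-cost model.

    Assumes the network is a tree and the scheme is connected in it,
    so propagating a write inside the scheme uses len(scheme) - 1 links.
    """
    cost = 0
    for node, count in reads.items():
        cost += count * hops_to_scheme(adj, scheme, node)
    for node, count in writes.items():
        cost += count * (hops_to_scheme(adj, scheme, node) + len(scheme) - 1)
    return cost

# Hypothetical path network 0 - 1 - 2 - 3 - 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
narrow = {2}               # single replica in the middle
wide = {0, 1, 2, 3, 4}     # replica on every node

# Read-intensive pattern: many reads at the edges, one write in the middle.
read_heavy = communication_cost, ({0: 10, 4: 10}, {2: 1})
cost_read_narrow = communication_cost(adj, narrow, {0: 10, 4: 10}, {2: 1})
cost_read_wide = communication_cost(adj, wide, {0: 10, 4: 10}, {2: 1})

# Write-intensive pattern: few reads at the edges, many writes in the middle.
cost_write_narrow = communication_cost(adj, narrow, {0: 1, 4: 1}, {2: 10})
cost_write_wide = communication_cost(adj, wide, {0: 1, 4: 1}, {2: 10})

print(cost_read_narrow, cost_read_wide)    # 40 4
print(cost_write_narrow, cost_write_wide)  # 4 40
```

Under these assumptions the wide scheme is ten times cheaper for the read-intensive pattern and ten times more expensive for the write-intensive one, which is the asymmetry that motivates adapting the scheme to the observed access pattern.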
Among these solutions, the Adaptive Data Replication algorithm (ADR) described in [19] is particularly interesting, as it was shown to be convergent-optimal with respect to communication costs. That is, as soon as the read-write access pattern changes, ADR adapts its replication scheme to minimize the communication cost caused by the routing of access requests. Intuitively, ADR organizes replicas as a connected graph, known as the replication scheme, which expands or contracts as the read-write access pattern changes. Unfortunately, this convergence towards optimality only holds under two strict conditions: (1) the network is organized as a tree (finding an optimal replication scheme was shown to be NP-complete for general topologies [20]), and (2) no process or link failures occur. Condition 1 implies that one must first build an overlay tree covering the network. Condition 2 implies that ADR ceases to work correctly as soon as a failure happens. Indeed, unreliable links may cause requests to be lost, thus misleading the replica placement strategy of ADR, while node failures may break the connectivity of the replication scheme, an essential assumption for ADR to work. Conditions 1 and 2 make ADR unsuitable for unreliable large-scale distributed environments.

Contributions. In this paper, we propose an architecture that extends ADR so that it can dynamically reorganize itself in response to changes in the application access patterns and to link and node failures. The new replica placement strategy relies on a specialized routing layer, which encapsulates our adaptive request routing strategy. The latter is based on a tree overlay that aims at maximizing the reliability of request routing, in spite of link and node failures. This tree, named the Maximum Reliability Tree (MRT), is a spanning tree containing the most reliable paths in the system [4].

Roadmap. The remainder of this paper is organized as follows.
Section 2 formally defines our model, describes and motivates the problem solved in the paper, and sketches the architecture of our solution. Section 3 presents our adaptive request routing algorithm based on a spanning tree maximizing the reliability of communication paths, while Section 4 describes an extension of the adaptive replica algorithm defined in [19], which aims at minimizing communication costs given a read-write pattern. In Section 5, we evaluate the benefit of using our adaptive request routing solution in terms of performance and adaptiveness, when both the access pattern changes and failures occur. Finally, Section 6 puts the proposed approach into perspective by comparing it with the state of the art; Section 7 concludes the paper and discusses future work.

[Footnote 3] In [19], processes switch to a special failure mode until recovery occurs. As detailed in Section 6, this approach is quite different from ours.

2 A Modular Approach to Adaptiveness

In this paper, we consider an asynchronous distributed system composed of processes (nodes) that communicate by message passing. Our model is probabilistic in the sense that processes may crash and links may lose messages with a certain probability. More formally, the tuple S = (Π, Λ, C) completely defines the (unreliable) environment considered in this paper, where Π is the set of processes and Λ is the set of bidirectional communication links. We only consider systems with a connected graph topology. Process crash probabilities and message loss probabilities are modeled as the failure configuration C. We then define object o as the data to replicate, while R ⊆ Π denotes the replication scheme of o, i.e., the set of nodes holding a copy of o. Any request sent to R is either a read or a write operation. Given these definitions, our approach consists in addressing the following two questions.

Adaptive Replica Placement. Given a pattern of reads and writes to o, which nodes should be part of R in order to minimize the communication cost?

Adaptive Request Routing. Given some failure configuration C, how should read/write requests be routed to maximize reliable delivery, and thus provide the replica placement layer with accurate information?

2.1 Adaptiveness to Access Patterns

[Figure: test evaluated at p1 with respect to its neighbor p2: reads via p2 > (∑ writes to p1) − writes via p2]
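The inequality that survives from the figure, reads via p2 > (∑ writes to p1) − writes via p2, matches what the ADR literature calls the expansion test: a replica p1 compares the reads it received through a neighbor with the writes it received from everywhere else. The following is a minimal sketch of that comparison only, under stated assumptions. The function name `should_expand`, the per-neighbor counters, the choice to count writes issued locally at p1 as part of "writes to p1", and all request counts are hypothetical and are not taken from the paper.

```python
def should_expand(reads_via, writes_via, local_writes, neighbor):
    """Evaluate the test for one neighbor that currently holds no replica.

    reads_via / writes_via: requests received through each neighbor during
    the last observation period (hypothetical counters).
    local_writes: writes issued at this node itself (assumed to count
    towards the total writes to this node).
    """
    total_writes = local_writes + sum(writes_via.values())
    writes_from_elsewhere = total_writes - writes_via.get(neighbor, 0)
    return reads_via.get(neighbor, 0) > writes_from_elsewhere

# Hypothetical counters observed at p1 over one period.
reads_via = {"p2": 15, "p3": 2}
writes_via = {"p2": 10, "p3": 4}
local_writes = 3

expand_to_p2 = should_expand(reads_via, writes_via, local_writes, "p2")
expand_to_p3 = should_expand(reads_via, writes_via, local_writes, "p3")
print(expand_to_p2, expand_to_p3)  # True False
```

With these numbers, the total number of writes to p1 is 17. For p2, 15 reads exceed the 7 writes that did not come via p2, so placing a replica at p2 would save more read traffic than the extra write propagation costs. For p3, 2 reads do not exceed 13 writes, so the scheme would not grow in that direction. The sketch also shows why lost requests matter: if counters such as `reads_via` undercount because of message loss, the comparison can flip.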